LR Recursive Transition Networks for Earley and Tomita Parsing
نویسنده
چکیده
Efficient syntactic and semantic parsing for ambiguous context-free languages are generally characterized as complex, specialized, highly formal algorithms. In fact, they are readily constructed from straightforward recursive Iransition networks (RTNs). In this paper, we introduce LR-RTNs, and then computationally motivate a uniform progression from basic LR parsing, to Earley's (chart) parsing, concluding with Tomita's parser. These apparently disparate algorithms are unified into a single implementation, which was used to automatically generate all the figures in this paper. 1. I N T R O D U C T I O N Ambiguous context-free grammars (CFGs) are currently used in the syntactic and semantic processing of natural language. For efficient parsing, two major computational methods are used. The first is Earley's algorithm (Earley, 1970), which merges parse trees to reduce the computational dependence on input sentence length from exponential to cubic cost. Numerous variations on Earley's dynamic programming method have developed into a family of chart parsing (Winograd, 1983) algorithms. The second is Tomita's algorithm (Tomita, 1986), which generalizes Knuth's (Knuth, 1965) and DeRemer's (DeRemer, 1971) computer language LR parsing techniques. Tomita's algorithm augments the LR parsing "set of items" construction with Earley's ideas. What is not currently appreciated is the continuity between these apparently distinct computational methods. • Tomita has proposed (Tomita, 1985) constructing his algorithm from Earley's parser, instead of DeRemer's LR parser. In fact, as we shall show, Earley's algorithm may be viewed as one form of LR parsing. • Incremental constructions of Tomita's algorithm (Heering, Klint, and Rekers, 1990) may similarly be viewed as just one point along a continuum of methods. * This work was supported in part by grant R29 LM 04707 from the National Library of Medicine, and by the Pittsburgh NMR Institute. The apparent distinctions between these related methods follows from the distinct complex formal and mathematical apparati (Lang, 1974; Lang, 1991) currently employed to construct these CF parsing algorithms. To effect a uniform synthesis of these methods, in this paper we introduce LR Recursive Transition Networks (LR-RTNs) as a simpler framework on which to build CF parsing algorithms. While RTNs (Woods, 1970) have been widely used in Artificial Intelligence (AI) for natural language parsing, their representational advantages have not been fully exploited for efficiency. The LR-RTNs, however, are efficient, and shall be used to construct" (1) a nondeterministic parser, (2) a basic LR(0) parser, (3) Earley's algorithm (and the chart parsers), and (4) incremental and compiled versions of Tomita's algorithm. Our uniform construction has advantages over the current highly formal, non-RTN-based, nonuniform approaches to CF parsing: • Clarity of algorithm construction, permitting LR, Earley, and Tomita parsers to be understood as a family of related parsing algorithm. • Computational motivation and justification for each algorithm in this family. • Uniform extensibility of these syntactic methods to semantic parsing. • Shared graphical representations, useful in building interactive programming environments for computational linguists. • Parallelization of these parsing algorithms. • All of the known advantages of RTNs, together with efficiencies of LR parsing. All of these improvements will be discussed in the paper. 2. L R R E C U R S I V E T R A N S I T I O N N E T W O R K S A transition network is a directed graph, used as a finite state machine (Hopcroft and Ullman, 1979). The network's nodes or edges are labelled; in this paper, we shall label the nodes. When an input sentence is read, state moves from node to node. A sentence is accepted if reading the entire sentence directs the network traversal so as to arrive at an
منابع مشابه
A New Approach to the Construction of Generalized LR Parsing Algorithms
LR parsing strategies can analyze LR grammars, which are deterministic. If we consider LR parsing tables in which each entry can contain several actions, we obtain non-deterministic LR parsing, often known as generalized LR parsing, which can analyze non-deterministic context-free grammars. It this context, some mechanism is needed in order to represent the non-deterministic evolution of the st...
متن کاملParsing methods streamlined
This paper has the goals (1) of unifying top-down parsing with shift-reduce parsing to yield a single simple and consistent framework, and (2) of producing provably correct parsing methods, deterministic as well as tabular ones, for extended context-free grammars (EBNF ) represented as state-transition networks. Departing from the traditional way of presenting as independent algorithms the dete...
متن کاملGeneralised LR parsing algorithms
This thesis concerns the parsing of context-free grammars. A parser is a tool, defined for a specific grammar, that constructs a syntactic representation of an input string and determines if the string is grammatically correct or not. An algorithm that is capable of parsing any context-free grammar is called a generalised (contextfree) parser. This thesis is devoted to the theoretical analysis ...
متن کاملPolynomial Time and Space Shift-Reduce Parsing of Arbitrary Context-free Grammars
We introduce an algorithm for designing a predictive left to right shift-reduce non-deterministic push-down machine corresponding to an arbitrary unrestricted context-free grammar and an algorithm for efficiently driving this machine in pseudo-parallel. The performance of the resulting parser is formally proven to be superior to Earley's parser (1970). The technique employed consists in constru...
متن کاملA Faster Earley Parser
We present a parsing technique which is a hybrid of Earley’s method and the LR(k) methods. The new method retains the ability of Earley’s method to parse using arbitrary context-free grammars. However, by using precomputed LR(k) sets of items, we obtain much faster recognition speeds while also reducing memory requirements.
متن کامل